Presenting summary statistics
Humboldt-Universität zu Berlin
2024-01-24
Chapter 27 (A field guide to base R) in Wickham et al. (2023)
course website: Ch. 12: base R
Today we will…
base R refers to the base package, which is required to run R, along with packages such as utils and stats (among others)
main goal of base R is stability
the tidyverse is constantly adding, updating, and changing functions
this means that tidyverse code is prone to “breaking”: tidyverse code that runs today might not run in a few years if some functions or arguments have been “deprecated”
What a thing to say when modern R is pretty much synonymous with the tidyverse for many in the community!
I was a base R masochist once too.. but there's no need for statements like this when the tidyverse has helped so many of us be more productive and write more readable code.
Bodo Winter (@BodoWinter), January 10, 2023
[1] "AgeSubject" "Word" "LengthInLetters" "WrittenFrequency"
[5] "WordCategory" "RTlexdec" "RTnaming"
[1] "AgeSubject" "Word" "LengthInLetters" "WrittenFrequency"
[5] "WordCategory" "RTlexdec" "RTnaming"
dplyr verbs
# A tibble: 10 × 1
AgeSubject
<chr>
1 young
2 young
3 young
4 young
5 young
6 young
7 young
8 young
9 young
10 young
the dollar sign ($) can be used to extract a column from a dataframe (or tibble)
dplyr::select() preserves the dataframe/tibble attributes of the column, whereas $ returns a plain vector
[1] "young" "young" "young" "young" "young" "young" "young" "young" "young"
[10] "young" "young" "young" "young" "young" "young" "young" "young" "young"
dataframe[row,column]
[1] "young" "young" "young" "young" "young" "young" "young" "young" "young"
[10] "young" "young" "young" "young" "young" "young" "young" "young" "young"
[1] "young" "young" "young" "young" "young" "young" "young" "young" "young"
[10] "young" "young" "young" "young" "young" "young" "young" "young" "young"
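To make the difference between the extraction styles above concrete, here is a minimal sketch in base R. It assumes a small toy data frame named df_eng that only mimics a few columns of the course dataset; the values in the "old" rows are invented for illustration.

```r
# Toy stand-in for the course dataset (not the real languageR_english.csv);
# the "old" reaction times are made up for illustration.
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# `$` and `[[` both return the column as a plain vector
age_dollar <- df_eng$AgeSubject
age_double <- df_eng[["AgeSubject"]]

# dataframe[row, column]: leaving `row` empty keeps all rows.
# On a data.frame, a single selected column is simplified to a vector.
age_index <- df_eng[, "AgeSubject"]

# drop = FALSE keeps the data frame structure,
# which is closer to what dplyr::select() returns
age_df <- df_eng[, "AgeSubject", drop = FALSE]

class(age_dollar)
class(age_df)
```

Note that tibbles behave differently here: indexing a tibble with [, "AgeSubject"] keeps it a tibble, so the simplification to a vector applies to plain data frames only.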
# A tibble: 10 × 2
AgeSubject RTlexdec
<chr> <dbl>
1 young 695.
2 young 600.
3 young 547.
4 young 617.
5 young 633.
6 young 687.
7 young 584.
8 young 527.
9 young 741.
10 young 536.
c()
AgeSubject RTlexdec
1 young 694.89
2 young 600.40
3 young 547.27
4 young 616.60
5 young 633.08
6 young 686.75
7 young 584.40
8 young 526.82
9 young 741.48
10 young 536.38
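The following sketch shows one way to select several columns with c() inside the square brackets. The data frame df_eng is a toy stand-in defined here for illustration (the "old" values are invented), not the course dataset itself.

```r
# Toy stand-in for the course dataset; "old" values are invented
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# Select columns by name: c() combines the names into one character vector
cols_by_name <- df_eng[, c("AgeSubject", "RTlexdec")]

# Selecting by position also works, but breaks silently if column order changes
cols_by_pos <- df_eng[, c(1, 3)]

head(cols_by_name, n = 3)
```

Selecting by name is usually the safer choice, since the code keeps working when columns are added or reordered.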
filter() function from dplyr
# A tibble: 856 × 7
AgeSubject Word LengthInLetters WrittenFrequency WordCategory RTlexdec
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 young doe 3 3.91 N 695.
2 young pork 4 5.02 N 617.
3 young prop 4 4.77 N 687.
4 young arc 3 4.89 N 741.
5 young tile 4 4.08 N 647.
6 young slope 5 5.80 N 633.
7 young pith 4 2.48 N 696.
8 young blitz 5 4.19 N 672.
9 young port 4 6.08 N 683.
10 young plan 4 7.46 N 636.
# ℹ 846 more rows
# ℹ 1 more variable: RTnaming <dbl>
[,]
AgeSubject Word LengthInLetters WrittenFrequency WordCategory RTlexdec
1 young doe 3 3.912023 N 694.89
4 young pork 4 5.017280 N 616.60
6 young prop 4 4.770685 N 686.75
9 young arc 3 4.890349 N 741.48
17 young tile 4 4.077537 N 647.07
18 young slope 5 5.802118 N 632.54
22 young pith 4 2.484907 N 695.86
26 young blitz 5 4.189655 N 671.59
29 young port 4 6.084499 N 683.36
34 young plan 4 7.462789 N 636.10
RTnaming
1 466.4
4 460.3
6 477.1
9 453.8
17 459.3
18 476.2
22 473.3
26 469.5
29 459.3
34 470.4
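As a rough illustration of row filtering with [,], the sketch below uses a toy data frame and a hypothetical condition (young participants with RTlexdec above 600). The condition used to produce the output above is not shown in the text, so this one is only an assumption for demonstration.

```r
# Toy stand-in for the course dataset; "old" values are invented
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# The logical condition goes in the `row` slot; the empty `column` slot
# (note the trailing comma) keeps all columns.
# Hypothetical condition: young participants with RTlexdec above 600.
keep <- df_eng$AgeSubject == "young" & df_eng$RTlexdec > 600
df_filtered <- df_eng[keep, ]

# Unlike dplyr::filter() on a tibble, the original row names are kept
rownames(df_filtered)
```

One difference to keep in mind: if the condition evaluates to NA for some rows, bracket indexing returns rows filled with NA, whereas dplyr::filter() drops them. Wrapping the condition in which() avoids this.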
filter() and select() (which we’ve already done before)
# A tibble: 10 × 2
AgeSubject RTlexdec
<chr> <dbl>
1 young 695.
2 young 617.
3 young 687.
4 young 741.
5 young 647.
6 young 633.
7 young 696.
8 young 672.
9 young 683.
10 young 636.
[,]
AgeSubject RTlexdec
1 young 694.89
4 young 616.60
6 young 686.75
9 young 741.48
17 young 647.07
18 young 632.54
22 young 695.86
26 young 671.59
29 young 683.36
34 young 636.10
AgeSubject RTlexdec
1 young 694.89
4 young 616.60
6 young 686.75
9 young 741.48
17 young 647.07
18 young 632.54
22 young 695.86
26 young 671.59
29 young 683.36
34 young 636.10
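Filtering rows and selecting columns can be combined in a single [,] call. The sketch below uses a toy data frame and a hypothetical filter condition; subset() is shown as another base R option, which is not necessarily the one used for the output above.

```r
# Toy stand-in for the course dataset; "old" values are invented
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# Rows and columns in one step: dataframe[row condition, column names]
res_index <- df_eng[
  df_eng$AgeSubject == "young" & df_eng$RTlexdec > 600,
  c("AgeSubject", "RTlexdec")
]

# subset() lets us refer to columns without the `df_eng$` prefix
res_subset <- subset(
  df_eng,
  AgeSubject == "young" & RTlexdec > 600,
  select = c(AgeSubject, RTlexdec)
)

res_index
res_subset
```

subset() is convenient for interactive work; its own documentation recommends the bracket form for programming, because of how it evaluates column names.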
mutate() function from dplyr
# A tibble: 4,568 × 8
AgeSubject Word LengthInLetters WrittenFrequency WordCategory RTlexdec
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 young doe 3 3.91 N 695.
2 young whore 5 4.52 N 600.
3 young stress 6 6.51 N 547.
4 young pork 4 5.02 N 617.
5 young plug 4 4.89 N 633.
6 young prop 4 4.77 N 687.
7 young dawn 4 6.38 N 584.
8 young dog 3 7.16 N 527.
9 young arc 3 4.89 N 741.
10 young skirt 5 5.93 N 536.
# ℹ 4,558 more rows
# ℹ 2 more variables: RTnaming <dbl>, rt_lexdec_s <dbl>
in base R, we name the new variable with the dollar sign (dataframe$variable) and assign the value with the assignment operator <-
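A minimal sketch of the dollar-sign assignment follows. It assumes that rt_lexdec_s (the new column visible in the tibble output above) holds RTlexdec converted from milliseconds to seconds; the text does not state this formula, so treat it as a guess. The data frame is a toy stand-in.

```r
# Toy stand-in for the course dataset; "old" values are invented
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# Name the new column on the left of `<-`, compute its values on the right.
# Assumption: rt_lexdec_s is RTlexdec divided by 1000 (ms to s).
df_eng$rt_lexdec_s <- df_eng$RTlexdec / 1000

names(df_eng)
head(df_eng, n = 3)
```

Unlike mutate(), which returns a new object, this modifies df_eng in place, so there is no need to reassign the result.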
summarise() from dplyr
data.frame() function
ggplot2 is popular even among people who don’t use the tidyverse
this is because it has some useful features and a clean look
In this chapter we will…
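To illustrate the summarise() and data.frame() pairing mentioned above, here is a hedged sketch of how a summary table might be built in base R. The summary statistics chosen (mean, standard deviation, count) and the grouped variant with aggregate() are assumptions for illustration, as is the toy data frame.

```r
# Toy stand-in for the course dataset; "old" values are invented
df_eng <- data.frame(
  AgeSubject = c("young", "young", "young", "old", "old"),
  Word       = c("doe", "stress", "pork", "doe", "pork"),
  RTlexdec   = c(694.89, 547.27, 616.60, 812.30, 790.10)
)

# Rough counterpart of summarise(): compute each statistic
# and collect the results in a one-row data frame
summary_all <- data.frame(
  mean_rt = mean(df_eng$RTlexdec),
  sd_rt   = sd(df_eng$RTlexdec),
  n       = nrow(df_eng)
)

# Rough counterpart of a grouped summary: aggregate() with a formula,
# read as "RTlexdec by AgeSubject"
summary_by_age <- aggregate(RTlexdec ~ AgeSubject, data = df_eng, FUN = mean)

summary_all
summary_by_age
```

aggregate() applies a single function per call, so several grouped statistics would need several calls (or a function that returns a vector), which is one place where summarise() tends to be more compact.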
Convert the following tidyverse code to base R. We will again use the languageR_english.csv dataset.
# A tibble: 10 × 2
Word WrittenFrequency
<chr> <dbl>
1 doe 3.91
2 whore 4.52
3 stress 6.51
4 pork 5.02
5 plug 4.89
6 prop 4.77
7 dawn 6.38
8 dog 7.16
9 arc 4.89
10 skirt 5.93
# A tibble: 10 × 7
AgeSubject Word LengthInLetters WrittenFrequency WordCategory RTlexdec
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 young stress 6 6.51 N 547.
2 young dawn 4 6.38 N 584.
3 young dog 3 7.16 N 527.
4 young skirt 5 5.93 N 536.
5 young are 3 11.3 N 611.
6 young pipe 4 6.00 N 563.
7 young guard 5 6.59 N 559.
8 young slope 5 5.80 N 633.
9 young pile 4 6.16 N 595.
10 young tide 4 6.08 N 598.
# ℹ 1 more variable: RTnaming <dbl>
# A tibble: 10 × 3
AgeSubject Word WrittenFrequency
<chr> <chr> <dbl>
1 old stress 6.51
2 old dawn 6.38
3 old dog 7.16
4 old skirt 5.93
5 old are 11.3
6 old pipe 6.00
7 old guard 6.59
8 old slope 5.80
9 old pile 6.16
10 old tide 6.08
What is your impression of base R versus the tidyverse? Based on what you’ve seen, would you prefer one over the other, or would you prefer one in certain cases only? There’s no correct answer here.
Produced with R version 4.3.0 (2023-04-21) (Already Tomorrow) and RStudio version 2023.9.0.463 (Desert Sunflower).
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] patchwork_1.1.3 janitor_2.2.0 here_1.0.1 lubridate_1.9.2
[5] forcats_1.0.0 stringr_1.5.0 dplyr_1.1.3 purrr_1.0.2
[9] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1 ggplot2_3.4.3
[13] tidyverse_2.0.0
loaded via a namespace (and not attached):
[1] utf8_1.2.3 generics_0.1.3 stringi_1.7.12 hms_1.1.3
[5] digest_0.6.33 magrittr_2.0.3 evaluate_0.21 grid_4.3.0
[9] timechange_0.2.0 fastmap_1.1.1 rprojroot_2.0.3 jsonlite_1.8.7
[13] fansi_1.0.4 scales_1.2.1 cli_3.6.1 rlang_1.1.1
[17] crayon_1.5.2 bit64_4.0.5 munsell_0.5.0 withr_2.5.0
[21] yaml_2.3.7 tools_4.3.0 parallel_4.3.0 tzdb_0.4.0
[25] colorspace_2.1-0 pacman_0.5.1 vctrs_0.6.3 R6_2.5.1
[29] lifecycle_1.0.3 snakecase_0.11.0 bit_4.0.5 vroom_1.6.3
[33] pkgconfig_2.0.3 pillar_1.9.0 gtable_0.3.4 glue_1.6.2
[37] xfun_0.39 tidyselect_1.2.0 rstudioapi_0.14 knitr_1.44
[41] farver_2.1.1 htmltools_0.5.5 rmarkdown_2.22 labeling_0.4.3
[45] compiler_4.3.0